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ABSTRACT 

Three non-equilibrium photoionisation algorithms for hydrodynamical grid-based simulation codes are compared in terms of accu- 
racy, timestepping criteria, and parallel scaling. Explicit methods with first-order time accuracy for photon conservation must use 
very restrictive timestep criteria to accurately track R-type ionisation fronts. A second-order accurate algorithm is described which, 
although it requires more work per step, allows much longer timesteps and is consequently more efficient. Implicit methods allow 
ionisation fronts to cross many grid cells per timestep while maintaining photon conservation accuracy. It is shown, however, that 
errors are much larger for multi-frequency radiation than for monochromatic radiation with the implicit algorithm used here, and 
large errors accrue when an ionisation front crosses many optical depths in a single step. The accuracy and convergence rates of the 
different algorithms are tested with a large number of timestepping criteria to identify the best criterion for each algorithm. With these 
criteria selected, the second-order explicit algorithm is the most efficient of the three, and its parallel scaling is significantly better 
than that of the implicit algorithm. The upgrade from first- to second-order accuracy in explicit algorithms could be made very simply 
to fixed-grid and adaptive mesh-refinement codes which currently use a first-order method. 
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1. Introduction 

Photoionisation is an important process in many situa- 
tions in astrophysics, being the dominant energy source 

driving the dynamics of H n regions around massive 

1969ft 



stars (e.g. Math ews & O'Dell 1 19691) . Numerical simula- 
tions of phot oionisati o n pro ceeded from the pioneering 
ID models of lLaskerl d 19661) to early 2D calculations in 
the 1980s (e.g. Hj odenhei mer. Tenorio-Tagle. & Yorkd 119791; 
ISandford. Whitaker. & Kleinlll982l) ; numerical methods and re- 
sults f rom th ese early works were reviewed in detail by 
lYorkel <fl986ft . Various degrees of approximation have been 
employed, for example assumi ng ionisation equilibrium (e.g. 
iGarcfa-Segura & Franco! 1 19961). non-equihbrium m onochro- 
matic radiation (e.g. IWhalen. Abel. & Nor man 2004), o r non- 
equili brium multi-frequency radiation (e.g. iFrank & Mellemal 
1994). 3D calculations on Cartesian gr ids became possible in the 
late 1990s dRaga et alJll999tlAbel etalll 1999ft and later on grids 
using 3D adaptive mesh refin ement (AMR) { Lim & Mellema 
120031) . 2D static nested grids dFrever. Hensler. & Yorkd 120031) 
and for 3D smoothed partic le hydrodynamics (SPH) (e.g. 



and tor 3D sm oothed particl 
iKessel-Devnet & Burkert)l2003l) 



Computing limitations have dictated that most calculations 
where the microphysics is coupled to the dynamics have used 
the on-the-spot (OTS) approximation for diffuse radiation. This 
assumes that recombinations to the ground state do not change 
the ionising radiation field because the photon produced by 
this recombination takes the place of a similar photon that 
will be absorbed rapidly to reionise the recombined atom. The 
OTS approximation therefore ignores both the change in en- 
ergy and in direction of the recombination radiation; in addi- 
tion, it becomes significantly more complicated when helium 
is also included in the model. I ts validity and e f fects are a 
subject of ongoing research (e.g. Ritzerveld 2005; Raga et all 



2009; IWilliams & Hennevl 120091) . More complicated schemes 
that do include recombination radiation are now being devel- 
oped for multi-dim ensional codes, such as th e non-ionising ra- 
diative transfer of iKuiper et al.l (1201 Oft and ICommercon et al.l 
d201Ch. i he cosmolo g ical r eionisation radamesh scheme of 
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ICantaiupo & Porcianil (1201 1[). and the Monte -Carlo photoioni 
sation algorithm of lHaworth & Harries! d2012l) . 

This work focusses on comparing methods using the OTS 
approximation, which generally fall into two major categories: 
explicit schemes requiring short timesteps and implicit schemes 
with less restrictive timesteps. These are introduced and dis- 
cussed separately in the following paragraphs. The terms "ex- 
plicit" and "implicit" here refer to the raytracing algorithm and 
hence to the inputs to the microphysics integration, not to the 
actual method used to integrate microphysical quantities within 
a cell (which is usually at least partially implicit). An explicit 
scheme therefore uses an instantaneous optical depth for the ray 
entering a cell, whereas in an implicit scheme the optical depth 
contains information from both the initial and time-advanced so- 
lution. 

1.1. Explicit timestepping schemes 

The photoionis at ion method implemented by 
IWhalen & Normaiil (I2006I) in a version of the zeus-mp 
code is representative of many algorithms commonly used 
in astrophysics fluid dynamics codes: first the timestep is 
calculated, followed by a hydrodynamics update and then 
an operator-split source term update (the order of the hy- 
drodynamics and s o urce t erm updates can be interchanged). 
Wha len & Normaiil d2006l) also include substepping of the 
chemistry reaction network within the source term evaluation 
because the chemical timescales can be much shorter than the 
dynamical timescales. For models with radiative transfer, each 
substep involves calculating the optical depth to every grid point 
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followed by a time integration of the rate equations and internal 
energy equation, and as such is first-order accurate in time 
for photon conservation. The substep timestep criterion used 
was rather stringent: Af = Q.ln e /h e , where n e is the electron 
number density and h P its partial time derivative. The method of 
Krumholz et al.l (120071) for photoionisat ion in the athena code 
uses a similar ti me-integrat i on sch eme. iMac Low et al.l d2007l) 
implemented the lAbel et al.l dl999) method in the zeus-mp code 
to study H ii regi on expansion with a simple heating and cooling 
implementation. lWise & Abel (1201 ll) describe a photoionisation 
module for the AMR co de enzq, using simil a r chem istry and 
timestepping routines to IWhalen & Normanl d2006b but with 
some modifications to make the scheme more efficient. As 
described in the literature, all of these methods are first-order 
accurate in time as regards photon conservation, and also use 
very restrictive timestepp ing criteria. 

Riikh orst et al.l d2006l) presented a photoionisation algorithm 
written for the flash code (Hybrid Characteristics - flash-hc) 
that demonstrated good parallel scaling and accuracy in calcu- 
lating optical depths. It did not restrict the timestep by the pho- 
toionisation time, thereby significantly reducing the computa- 
tional cost of a calculation, but at t he expense of pr opagating 
R-type ionisation fronts too slowly dlliev et al. 200 61. To al- 



leviate this problem in the code tests of llliev et al.l (l2006ah . an 
extra timestep restriction was imposed that was triggered by the 
presence of R-type ionisation fronts, although the form of this 
restric tion was not specified. Building on this work. lPeiters et al.l 
(2010) significantly improved the flash-hc algorithm and used 
it to model massive star formation and the growth of H n re- 
gions around young stars while they are still accretin g gas from 
their s urroundings. Although it is not described in Peter s et al.l 
d2010l) . the flash-hc integration should be second-order accurate 
in time because the source terms are evaluated both in the half 
step and the full step of the hydrodynamics update (T. Peters, 
private communication). This gives a time-centred density field 
for the raytracing in the full step update, significantly increasing 
the accuracy of the method, as will be demonstrated here. This 
scheme appears to avoid the timestepping limitations of other 
explicit algorithms by sacrificing some accuracy in the tracking 
of R-type ionisation fronts. 

Explicit schemes scale reasonably well on parallel clusters 
because so little computation is required in the raytracing step, 
but the sequential nature of the raytracing is always a limiting 
factor. For problems with a simple spherical geometry, reason- 
able parallel scaling can also be obtained b y only decomposing 
the do main along surfaces of constant angle (Wh alen & Normanl 
2006). The main limitation of explicit algorithms, however, is 
that the timesteps must be so short - for optically thick neutral 
cells an ionisation front can only cross at most a single cell per 
microphysics integration, leading to a Courant-like limit applied 
to the ionisation front velocity. 

1.2. Implicit timestepping schemes 



The C 2 -ray method (Mellem a et al.l l2006b) is an implicit ray- 
tracing algorithm that was designed to overcome the timestep- 
ping restrictions of fully explicit schemes. During the tracing 
of rays outwards from the source, cells are integrated forwards 
a full timestep and a tim e-averaged optical de pth (or attenua- 
tion fraction in the case of Mackev & Lim 2010) is passed to the 
next cell. The optical depths to every cell therefore contain in- 
formation from both the initial and final states and are hence im- 
plicit. This allows ionisation fronts to cross many optically thick 
grid cells per timestep, an impossibility for an explicit scheme. 



This method has been shown to conserve photons well and to 
track R-type ionisation fronts with the correct speed, as long as 
the timestep is limited to a fraction of the recombination tim e 
dMellema et al.ll200^|jTie7et^l200^lMackev & Limll2010h . 
It was designed primarily for calculation of the reio nisation of 
the un iverse by the first stars and protogalaxies (e.g. llliev et al.l 
2006b), but has been successfully coupled to hydrodynamics 
(HD) and magnetohydrodynamics (MHD) codes to model both 
the expansion of H n regions into turbulent density fields aroun d 
single massive stars (Mellem a et al.ll2006at Arthur et"al] 201 ll) . 



2009; 



and the photoionisation of d ense globules dHennev et al. 
iMackev & Liml 1201 Oi 1201 lb . The only real weakness of this 
method as a time integration scheme is that the microphysics 
integration happens during the raytracing step, which must be 
performed in sequence outwards from the sources and hence has 
rather poor scaling properties on distributed memory computing 
clusters. It is therefore more suited to shared memory systems. 

The main aim of this work is to compare the accuracy and 
parallel scaling of explicit and implicit schemes implemented 
with the same code and to identify a sufficient timestep criterion 
for explicit schemes to accurately track R-type ionisation fronts. 
The code and algorithms used here are described in Sect. [2] 
The accuracy of the three algorithms is compared in Sect. [3] 
for the case of monochromatic ionising radiation, and in Sect. |4] 
for multi-frequency ionising radiation with frequency-dependent 
optical depths. The parallel scaling of the algorithms is assessed 
in Sect.|5]for static and dynamical situations. The results are dis- 
cussed and conclusions are summarised in Sect. [6] 



2. Algorithm implementation 

The algorithms tested here have been implemented in the ray- 
tracing/photoi onisation/magnetohyd rodynamics (R-MHD) code 
described in IMackev & Liml d2010l 1201 ll) . to which the reader 
is referred for further details. It is a finite-volume, uniform-grid 
code written in C++ and parallelised by domain decomposition, 
with communication of internal boundary data through the mes- 
sage passing interface (MPI). The equations of inviscid com- 
pressible HD or ideal compressible MHD are solved on a uni- 
form fixed grid in 1-3D; here a spherically symmetric ID grid, 
a plane-parallel ID grid, an axisymmetric 2D grid in (z,R), and 
a 3D Cartesian grid are used. These equations are supplemented 
by a microphysics integrator for the ion fraction of Hydrogen, y, 
and a raytracer to calculate column densities from either point 
sources or from sources at infinity. A tracer variable is used for 
y (and any other species to be integrated); this tracer is passively 
advected with the flow and also has creation (ionisation) and 
destruction (recombination) source terms. Microphysical (radia- 
tive and collisional) heating and cooling processes also provide a 
source term to the energy equation. For HD the equations solved 
are as follows (the conservation of mass, momentum, energy, 
and H + ions, respectively): 



dp 
dt 



+ V • [pv] = 



+ V • [V ® pv] + Vp R = 

at 

8E 



— + V ■ {v [E + Ps ]} = T(p,y,NmNm) - A(p,y, T) 



p + v ' [pyy] ] = A ^y' Nm ^ [1 - y ]+ 



A d {T)n H y[l-y]-al(T)n H y 2 . (1) 
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Here the gas density, pressure, velocity, and total energy density 
are [p, p g , v, E] respectively, where E = E- mt + 0.5pv 2 is the sum 
of internal and kinetic energy densities. For a gas with constant 
adiabatic index y, we have E[ nt = p g /[y- 1]. The totalHnumber 
density is tin = p/(2.4 x l(T 24 g). T and A are the heating and 
cooling rates per unit volume in the cell (ergcirT 3 s _1 ), respec- 
tively. A is a function of p, y, and the gas temperature T, while 
T depends also on the column density of H nucleons, Ah, and 
of neutral H, Aho- The collisional ionisation (A c j) and Case B 
radiative recombination (o^.) rates are functions of T and are in 
units of cm 3 s _1 ; the photoionisation rate (A p ;) is a function of 
(p,y,Nno) and distance from the source, with units of s _1 . 

The homogeneous parts of these equations are integrated us- 
ing a directionally unsplit, second-order (in time and s pace), 
finite volume formulation described fo r axi symmetry by iFalld 
dl991l) and for Cartesian geometry by iFalle et all d 19981) . The 
scheme for spherical coordinates in ID is a trivial modi- 
fication of the axi symmetric algorithm; some results from 
iBoss & Mvhiil dl992h were used for the second-order recon- 
struction. Microphysical source terms are then solved by oper- 
ator splitting using one of three possible algorithms, described 
in the following subsections. Photoionisation and ionisation- 
heating rates require the optical depth and distance from any 
radiation sources to the cell in question. This is calculated us- 
ing a short characteristics ray-tracing module with the interpo- 
lation weighting scheme advocated by iMellema et al.1 d2006b) 
(this is more accurate than the weighting proposed in appendix 
B of iRiikhorst et al.ll2.QQ6l) ; diffuse radiation is treated approxi- 
mately by the OTS approximation. In the microphysics update 
the source terms are integrated, giving the following ODEs: 

y=A pi (p,y,N m )[l - y] + A ci (T) nB y[l - y] - a%{T)nuy 2 
E mt = T{p, y, A H , Aho) - A(p, y, T) . (2) 

Here the density is constant for each cell so temperature is a 
function only of E m and y. All algorithms described below use 
the same microphysics integrator and heating and cooling rates 
to enable a fair comparison between models. Variables are inte- 
grated in time using backward differencing with Newton itera- 
tion, implemented with t he cvode solver of the Sun dials numer- 
ical integration library dCohen & Hi ndmarsh 1996). 

The heatin g and cooling functions use either the 
Mac kev & Liml (1201 Oh model C2 o r the much more de- 
tailed model of iHennev et al.l (|2009), which was calibrated 
using a dedicated photochemistry code (although here their 
X-ray heating term is omitted). The more detailed model en- 
ables the inclusion of multi-frequency photoionisation sources 
(to model the spectral hardening of radiation with optical 
depth) and heating due to far-ultraviolet (FUV) non-ionising 
stellar radiation, both of which have a significant effect on 
photoionisation simulations. It also provides a more realistic 
cooling function for dense neutral gas, although the details of 
the cooling physics are not so important for this work. The 
code can be switched by a compile flag to use either the C2 
heating/cooling function with monochromatic radiation, or the 
more detailed heating/cooling with multi-frequency radiation. 
The multi-frequency photoionisation and photo-heating rates 
are pre-calculated for a given source spectru m and tabulated as a 
functio n o f optical depth as descr ibed in e.g. iFrank & Mellemal 
dl994l) and lMellema et all ( l2006bl) . 



2. 1 . Implicit algorithm 

The raytra cing/microphysics scheme used in Mac kev & Lim 
f 201(a [2O1H ) is a variant of the C 2 -ray algorithm (Mel lema et al. 
2006b), and will be referred to here as Algorithm 1 (or sim- 
ply Al). Some improvements have been made to the algorithm, 
so it is described again here. The algorithm has two interfaces 
with the main simulation code: one for calculating the simula- 
tion timestep and one for updating the microphysical quantities. 
In each timestep, first the timestep At is calculated, then the com- 
bined raytracing and microphysics update of the internal energy 
density (Emt) an d neutral fraction (1 -y) is performed, followed 
by a second-order-accurate dynamics update. The timestep cri- 
teria are discussed in more detail below, but a basic require- 
ment for accurate tracking of R-type ionisation fronts is that the 
timestep must be limited to a fraction of the recombination time, 
^ = l/a£n H , (Me llema et"aT1l2006bl) . 

For Al the two source term integrations defined by 
Equations [2] are supplemented by integrating the attenuation 
along the ray segment passing through the cell as described in 
iMackev & Liml d2010b . This allows the calculation of a time- 
averaged attenuation fraction, which can be converted to a time- 
averaged column density. The integration is performed at the 
ionisation threshold hv^ = 13.6 eV, and the time-averaged at- 
tenuation fraction of photons at v = vo is then 

2 pt+ht 

</vo>=^J exp[-AT V0 (W , (3) 

where the cell optical depth, At v = rtuoAscr v is the product of 
the neutral H number density, the ray segment length As, and 
the photoionisation cross-section cr v . This time-averaged atten- 
uation fraction is converted to a time-averaged column density 
and used to calculate the column density to the next cell further 
from the source. 

It was found to be more numerically stable to integrate the 
neutral fraction than the ion fraction, so the three variables inte- 
grated are (1 - y, E mt , exp[-AT Vo ]). The variables are integrated 
using cvode with a relative error tolerance of 10~ 4 and with an 
absolute error tolerance of 10~ 12 , 10~ 17 , and 10~ 3() , respectively. 
This is a more accurate int egration scheme than that used in 
Mac kev & Liml (1201 Oi 1201 ll) and is consequently more compu- 
tationally expensive. 

2.2. Explicit algorithms 

Two explicit integration algorithms have been implemented, the 
first of which is similar to previously published methods (see 
section [TJ. Algorithm 2 is a replacement of Al with a new 
timestep criterion and m icrophysics integration, similar to the 
IWhalen & Normanl (120061) algorithm, but without substepping. 
It is again fully operator-split from the dynamics and a timestep 
proceeds as follows: 

1 . Rays are traced to calculate neutral (and optionally total) col- 
umn densities to each cell. 

2. Microphysical and dynamical timesteps are calculated; the 
minimum over all cells is used. 

3. Microphysical quantities are integrated from t — > t + At using 
the instantaneous column densities. 

4. Using this intermediate state as a starting point, a second- 
order dynamics update is performed over the time interval 
At. 

The key difference between A2 and Al is that instantaneous col- 
umn densities are used from the beginning of the timestep for 
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Algorithm 3 



Trace rays to get N(H), N(H0) at time t 



Calculate timestep At 



Half step from t to t+At/2 



5 



Update microphysics to t+At/2 



Update dynamics to t+At/2 (1st order) 



Full step from t to t+At 



5 



Trace rays: N(H), N(H0) at time t+At/2 



Update microphysics to t+At 



Update dynamics to t+At (2nd order) 



Increment time: t=t+At 



Fig. 1. Sequence of calculations for a single timestep for al- 
gorithm 3, which traces rays twice per step: the time-centred 
second raytracing makes photon conservation second-order in 
time, thereby allowing a less restrictive timestep and conse- 
quently proving to be more computationally efficient than algo- 
rithm 2. Second-order accurate microphysics substeps could use 
the same algorithm, but omitting the dynamics updates. 



the integration of the microphysics equations. This allows the 
raytracing step to be separated from the microphysics update, 
which has parallel scaling advantages already discussed. The 
microphysics integration is exactly the same as for Al except 
that the time-averaged attenuation fraction is not needed, so only 
two variables are integrated, i.e. (1 - y,E[ nt ). It was found that 
an accurate solution with A2 required very restrictive timesteps, 
which would make multi-dimensional calculations very compu- 
tationally expensive (timestepping criteria are discussed in the 
next section). This is because the optical depths are not time- 
centred in a timestep and so photon conservation is first-order 
accurate in time, making it very difficult to accurately track R- 
type ionisation fronts. 

Algorithm 3 is a modification of A2 in which raytracing is 
performed twice per step, once at the beginning to calculate the 
timestep, and secondly using the time-centred half-step density 
(and ion fraction) field. The sequence of calculations for A3 is 
shown in Fig. Q] The second raytracing sets the optical depths 
used for the full step microphysics update and leads to second- 
order accurate photon conservation. While it requires signifi- 
cantly more calculation per step compared to A2 (two raytrac- 
ings and source term integrations per step), the higher order of 
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Table 1. Timestepping criteria used for test simulations. 
Constants K\ - K4 are timestep-limiting factors for four crite- 
ria as discussed in section 1231 Entries with Ki = 00 indicate that 
the criterion is not used. 



accuracy leads to a much more computationally efficient inte- 
gration, as will be shown in the next section. During the writing 
of this paper it was discovered (T. Peters, private communica- 
tion) that A3 is similar to the time update used in the upgraded 
version of flash-hc, although here the timestepping criterion has 
been varied to track R-type ionisation fronts accurately. 



2.3. Timestepping criteria 

Previous authors using algorithms similar to A2 and A3 have 
employed a wide variety of different timestepping criteria, so an 
attempt is made here to identify a sufficient criterion for each 
algorithm. For Al the timestep should be a fraction of the re- 
combination time, so a constant K\ is defined by At = K\t KC . A 
timestep limited by the relative change in energy is also consid- 
ered (Af = K2E mt l\E mi \), and by the relative or absolute change 
in y (At = K3 max(0.05, 1 - y)/\y\ and At = Ki/\y\, respectively). 
If all four criteria were used together, the microphysics timestep 
limit would be given by 

/ E int max(0.05, 1 -y) 1\ 
Af = min AUec, K 2 -^\, K 3 V — - ^, A4- . (4) 

\ \E int M Ml 

The constant 0.05 in the third criterion is required to prevent the 
timestep from becoming very small when a cell approaches full 
ionisation. Different combinations of these criteria are assigned 
an ID number and listed in Table Q] For criteria dt00-dt04 the 
explicit integration is limited only by the absolute change in y, 
and the implicit algorithm only by the recombination time. For 
criteria dt05-dt08 all algorithms are limited additionally by the 
fractional change in energy, and the implicit algorithm by the 
absolute change in y rather than the recombination time. For cri- 
teria dt09-dtl2 the limit on the absolute change in y is replaced 
by a limit on the relative change in (1 - y), i.e. the neutral H 
fraction. 



3. Ionisation fronts for monochromatic radiation 

T he accuracy of the pred ecessor of Al was extensively tested 
in Mackev & Lim (2010) for monochromatic radiation, and the 
C 2 -ray algorithm to which it is closely related has been tested 
and used successfu lly in many calculations (e.g. llliev et ali 2009: 
lArthur et al.ll201 lb . Most emphasis in this section has therefore 
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Fig. 2. Velocity errors as a function of timestep criterion (x-axis 
integers correspond to criteria dt00-dtl2) for a ID plane-parallel 
ionisation front propagating through a static density field with 
no recombinations for algorithms Al (red), A2 (green) and A3 
(blue). 



been devoted to testing the explicit algorithms A2 and A3. It 
is important to separate raytracing (spatial discretisation) errors 
from time integration errors, so tests were performed on a ID 
grid, initially in plane-parallel geometry and then in spherical 
geometry. 

3.1. Plane-parallel radiation without recombinations 

The first test is a basic test of photon conservation: a ID slab- 
symmetric grid was set up with 100 cells spanning 0.51 pc. 
A source was placed at x = —oo with monochromatic ionising 
photon flux F y = 10 9 cm _2 s _1 and photon energy 18.6 eV, ir- 
radiating a neutral uniform density field with «h = lOOcrrT 3 . 
This resulted in an ionisation front propagating with a constant 
velocity of vrF = lOOkms -1 through a grid with cell optical 
depths At ^ 10 for a (constant) photoionisation cross-section 
<x = 6.3 x 10~ 18 cm 2 . The cross-section was set to be frequency- 
independent for the monochromatic radiation tests; this was re- 
laxed later for multi-frequency radiation. For the monochromatic 
tests in this section the only significance of the cross section 
is to set the cell optical depth, so it is unimportant that the ac- 
tual cross section at 18.6 eV is somewhat smaller than the value 
used. Recombinations, collisional ionisation, and dynamics were 
switched off, so for a perfect integration the number of ions per 
unit area on the grid is Ni(f) — F y xt. This is a relatively simple 
problem for Al, but for A2 and A3 the accuracy depends cru- 
cially on the timestep criterion because photon attenuation is not 
averaged over a timestep. 

The results are shown in Fig. [2] for all three algorithms for 
all timestep criteria (dt00-dtl2). The relative error is plotted on 
the y-axis as a function of the timestep criterion. Relative error is 
defined here as the fractional difference between the number of 
ions and the number of neutrals, which for this problem corre- 
sponds to the fractional error in ionisation front velocity. There 
are no recombinations, so the fractional error stays roughly con- 
stant for the full simulation, and the plotted values are the mean 
values for each simulation averaged over many timesteps. For 
Al the error is very small for all timestep criteria, as expected 
for an algorithm specifically designed to conserve photons. It is 
surprising that the error is significantly smaller even than the rel- 
ative error criterion of the microphysics integrator (10~ 4 ). 



For A2 and A3 the error is very dependent on the timestep 
criterion, and the much more rapid convergence of A3 with de- 
creasing timestep is clearly shown. The three different types of 
criteria can be clearly identified in the A2 curve: dt00-dt04 have 
the accuracy increasing by roughly a factor of 2 each time, dt05- 
08 have the same with a smaller initial error, and dt09-dtl2 have 
a similar error. For A2 the convergence is linear, as expected, 
but the error is large for all timestep criteria, being significantly 
less than 1% only for dt08 and dtl2. With A3, by contrast, dt02 
already gives less than 1 per cent error, and convergence is ba- 
sically quadratic. For timestep criteria dt05-dtl2 the accuracy is 
within a factor of 10 of the relative error tolerance of the micro- 
physics integrator, so any trends in the error are not significant. 

At first glance it seems surprising that A2-dt05 is more ac- 
curate than A2-dt04, which has by far the smaller value of K\. 
The reason is that dt05 also limits Af by the relative change in 
energy, so in fact they take almost the same number of timesteps. 
The internal energy of a cell increases by a factor of about 320 
(from neutral gas with T 50 K to ions plus electrons with 
T ~ 8000 K) and K 2 = 0.5 allows a 50% increase in -cj n t per 
step, meaning about 14 timesteps are required to go from neutral 
to ionised. For dt04 (with K\ = 1/16) about 16 timesteps are 
required to ionise a cell, comparable to dt05, and so we expect 
the two criteria to have a similar accuracy. Indeed, a log plot of 
the number of timesteps taken for each criterion with A2 shows 
approximately the inverse (with arbitrary normalisation) of the 
fractional error plotted in Fig. [2] 

This is a scale-free problem, so the relative error 
should be independent of ionisation front velocity, and 
this has been verified numerically for velocities of vjp = 
[10,30,100,300, 1000] km s -1 . Of course in a dynamical cal- 
culation the Courant condition provides an upper limit to the 
timestep, which will reduce the error for slowly moving ionisa- 
tion fronts. For D-type ionisation fronts (subsonic by definition) 
the Courant condition automatically imposes K4 < 1 . The sim- 
ulations were also repeated for densities of nu = lOcrrT 3 and 
«h = 1000 cm 3 while keeping the cell optical depth constant, 
and the results are almost indistinguishable. The effects of cell 
optical depth on the accuracy of the three algorithms is studied 
in more detail for multi-frequency radiation in section|4] 

3.2. Point source radiation in spherical symmetry 

Here the spherically symmetric expansion of a Stromgren sphere 
in a static medium is calculated and results are compared to the 
analytic solution. A ID spherical grid was set up with 3840 cells 
uniformly covering the range re [0, 1 .94] pc with a constant 
gas density of «h = lOOcirT 3 and a temperature T = 50K, giv- 
ing a cell optical depth to ionising photons of At ^ 1 . A radia- 
tion source was placed at the origin with monochromatic ionis- 
ing photon luminosity N - 10 48 s and photon energy 18.6 eV. 
The recombination rate of H + was set to be independent of tem- 
perature, thereby enabling straightforward comparison with the 
analytic solution. Results using densities lOx lower and higher 
(with associated lower and higher source luminosities) gave al- 
most indistinguishable results. For simulations with fewer grid 
cells and correspondingly higher cell optical depth the results 
are also almost indistinguishable for Al, whereas A2 and A3 
become slightly more accurate. The simulations were again run 
for all 12 timestep criteria and all three algorithms. 

Sample results for the Stromgren sphere expansion are 
shown in Fig. [3] for the ratio of the actual (/? a ) to theoretical 
{Rif) radius given by Rn(t) - Rs[l - exp(-f/f rec )]'^ 3 , where the 
Stromgren radius R$ is here equal to R$ = 1.42 pc. Results 
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for dt00-dt04 are shown for Al and A3, whereas dt05-dt08 
are shown for A2 because the results for dt00-dt04 had much 
larger errors (even dt04 had a 2% error). For this calculation, Al 
reaches 0.3f rec in a single step for dtOl with an error in position 
of < 4 per cent, decreasing to < 1.5 per cent for Af = 0.1f rec 
(dt02). Even at t — 0.1f rec the ionisation front has crossed many 
cells in a single step with dt02, and the error is smaller than that 
obtained using A3 with dtOl, which takes two timesteps for ev- 
ery grid zone the ionisation front crosses. This shows the huge 
efficiency advantages of Al in tracking R-type ionisation fronts 
with reasonable accuracy. If an error of < 1 - 1.5% is the max- 
imum allowed, then dt07 is the first timestep criterion that is 
accurate enough with A2. With A3 dt02 is already more than ac- 
curate enough. The three simulations with similar accuracy (dt02 
for Al, dt07 for A2, and dt02 for A3) took 100, 22661, and 2923 
timesteps, respectively, to reach t = 10f rec - In terms of runtime 
(not counting initialisation and data I/O), the respective runtimes 
are 12.6, 180, and 65.7 seconds, the differences in runtime be- 
ing smaller than suggested by the number of timesteps because 
shorter timesteps have a less costly microphysics integration. 

For this problem most of the error accrues at the early stages 
of expansion (t < 0.1f rec ) when the ionisation front is R-type 
and a 10 per cent velocity error can rapidly become a large posi- 
tion error. The error of course decreases for all timestep criteria 
when the steady state is approached after a few recombination 
times. The steady state does not correspond exactly to the ana- 
lytic solution because the ion fraction is not a perfect step func- 
tion in radius - instead there is a small neutral fraction within 
the H n region that increas es as the radius approaches Rs (see 
e.g. lPawlik & Schavdl2008l) . This leads to an equilibrium radius 
slightly larger than Rs , with the difference depending on the ion- 
ising spectrum. For the monochromatic radiat ion used here and 
the ot her parameters given above, eq. 33 in iPawlik & Schavd 
(2008) can be solved to show that the equilibrium radius with 
y = 0.5 is r = 1.004/?$, almost exactly the value that the sim- 
ulations relax to at t — 10f rec - Note that the fractional errors in 
Fig.[3]are smaller than for Fig.|2]because here the radius corre- 
sponds to the cube-root of the number of ions. There are other 
physically motivated cases, such as a 1/r 2 density field, where 
the ionisation front remains R-type for much longer, and in this 
situation the ionisation front position errors would continue to 
increase with time. 

4. Ionisation fronts with multi-frequency radiation 

A more realistic model of propagating ionisation fronts is ob- 
tained by considering a spectrum of ionising photons and in- 
cluding the frequency-dependent photoionisation cross-section 
of H. The source spectrum is modified as the optical depth in- 
creases so there is no guarantee that the results obtained with 
a monochromatic source will still hold with a multi-frequency 
source. To test this, the same calculation as in section 13.21 was 
performed using an ionising source spectrum with a blackbody 
temperature T = 37 500 K and normalised to have the same ion- 
ising photon luminosity (N = 10 48 s _1 ). The recombination rate 
was allowed to vary with temperature so the actual Stromgren 
radius is not exactly the same as before. The number of grid 
zones was varied to give simulations with cell optical depths (at 
v = vo) of Ato - [1,3, 10,30] to study the effects of resolu- 
tion on the solution obtained. Additionally, the number density 
of the ISM was varied (with an accompanying change in source 
luminosity to give the same ionisation front expansion velocity) 
with n H = [10, 100, 1000] cm 4 ]. While the number density had 
almost no effect on the solution accuracy, it did affect the code 
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Fig. 3. Expansion of a Stromgren sphere in ID for the three 
algorithms: the implicit Al (top), first-order explicit A2 (centre), 
and second-order A3 (bottom), with different timestep criteria as 
indicated. The ratio of actual to theoretical ionisation front radius 
is plotted on a logarithmic time axis from t = 0.001f rec for A2 
and A3, and from 0.1f rec for Al (because the timestep criterion 
is much less restrictive for Al). The different convergence rates 
for A2 and A3 are again apparent. 



efficiency for Al with dt00-dt04 because the recombination time 
scales inversely with density. 

This gives a grid of 12 simulations, each to be run with 3 
different algorithms, each using 12 different timestep criteria. 
There are two main considerations for each calculation: the ac- 
curacy compared to the most accurate solution, and the runtime. 
The best combination of algorithm and timestep criterion will 
be that which achieves a certain required accuracy with the least 
computation, with the overall restriction that the computation re- 
quirement is not prohibitive. 
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Fig. 4. Expansion of a Stromgren sphere in ID using Al with an ambient gas density of «h = 10 cm (results for higher densities 
are almost indistinguishable). The position of the cell with the steepest radial gradient in H + fraction is plotted as a function of 
time for simulations with cell optical depth Ar - 1 and dt00-dt04 (top left), At - 3 and dt00-dt04 (top right), Ar - 10 and 
dt00-dt04 (centre left), Ar - 30 and dt00-dt04 (centre right), Ar - 10 and dt05-dt08 (below left), and At 30 and dt05-dt08 
(below right). The equivalent plots for dt09-dtl2 all show results indistinguishable from the dtl2 curve. The trend for decreasing 
accuracy with increasing cell optical depth (i.e. decreasing numerical resolution) for criteria dt00-dt04 is clearly seen in the first four 
panels; this is corrected by the criteria dt05-dt08 that also limit the timestep by the relative change in internal energy. Discreteness 
in the computational grid and the output frequency account for the non-smooth curves in the plots for Atq ^10 and 30 (also in the 
following figures). 



4.1. Algorithm accuracy 

The most basic property of the simulation is the location of the 
ionisation front as a function of time. This was calculated by 
finding the cell with the largest second-order radial gradient in 
the H + fraction defined by 

dy I y(n+i) - y(n-\)\ „ . n „ 

max— = max V z e (l,Ni - 2) , (5) 

or \ r i+ i - n-\ j 

where there are N-, grid zones and i is zero-offset. This produces 
almost identical results to other criteria, e.g. the first cell with 



y < 0.5, although as the ionisation front reaches the Stromgren 
radius at t > f rec the cell with the steepest gradient can occasion- 
ally retreat/advance by one cell as the ionisation front relaxes to 
equilibrium. Simulations with higher cell optical depths neces- 
sarily have fewer and larger cells; grid discreteness effects are 
clearly seen in Figs. @]-|6] for models with cell Atq 30. The 
ionisation front radius as a function of time is shown for rep- 
resentative simulations run with Al in Fig. [4] and with A2 and 
A3 in Fig. [5] Most panels show results for dt00-dt04 compared 
to the A3 result for dtl2 (the most accurate run) because dtOO- 
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Fig. 5. As Fig.|4]but for A2 and A3. The panels show results for «h = 10 cuT 3 ; results for other ambient densities are indistinguish- 
able, and the accuracy of all criteria increases somewhat with increasing cell optical depth (i.e. decreasing numerical resolution). 
The panels show results for A2 with cell optical depth Ato =* 1 and timestep criteria dt00-04 (top left), dt05-dt08 (top right), and 
dt09-dtl2 (centre left), and for A3 with dt00-dt04 and Ar - 1 (centre right), Ar 3 (bottom left), and Ar - 30 (bottom right). 
In all plots the reference result in the heavy black line is from A3 with dtl2. 



dt04 use the least restrictive timestep criteria and hence have the 
largest errors. Fig. [4] shows that for Al the best solution is ob- 
tained for low cell optical depths, and the error increases steadily 
with optical depth, to an error in ionisation front radius of about 
10-15% in the worst case. Data exist for dt00-dt03 at t ~ 0.01f rec 
with Al because At is also limited by the Courant condition, 
and in addition, the first timestep is artificially set to be very 
short. The results show, in contrast to the monochromatic radia- 
tion results, that timestep-limiting based only on the recombina- 
tion time (dt00-dt04) is not reliable for very optically thick cells 
with multi-frequency radiation. Timestep-limiting using the rel- 
ative energy change (dt05-dt08) is much more stringent for the 
early expansion of the H n region and hence provides an accurate 



solution. Limiting additionally by the relative change in neutral 
fraction (dt09-dtl2) provides little extra benefit. 

In contrast to Al, algorithms A2 and A3 are less accurate 
at lower cell optical depths. The first three plots in Fig. [5] show 
results for A2 equivalent to those for Al in Fig. |4] except that 
here only the worst case is shown with cell Ato — 1. Curves 
for all timestep limiters are also shown, and it can be seen that 
dt03, dt04, dt07, dt08, and dt09-dtl2 provide adequate fits, but 
only dtl0-dtl2 are properly converged to the solution obtained 
with A3. For A3 only results for dt00-dt04 are shown in the last 
three plots of Fig. [5]because all of dt06-dtl2 are indistinguish- 
able from each other on this plot for all densities and cell Atq 
values. Only dtOO is a noticeably bad solution with A3. 
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Fig. 6. Plots of the radial profile of the H + fraction at t — f rec for Al (first three plots) and A3 (last three plots). Results for 
«h = lOcrrT 3 are shown (results for higher densities were indistinguishable) for Al with cell optical depths Atq st 1 (top left), 
Ato — 10 (top right), and Aro =* 30 (centre left), and the equivalent plots for A3 are centre right, bottom left, and bottom right. For 
Al and A3 the criteria dt06-dtl 1 all provide good fits, very close to the dtl2 results. As in previous figures, the discreteness of the 
grid is seen in the Atq - 30 plots. 



Fig.|6]shows the radial profile of the H + fraction at t — f rec for 
Al (above) and A3 (below) for timestep criteria dt00-dt04, com- 
pared to the numerically converged result from dtl2. Al over- 
predicts the ionisation front location for low time-accuracy (as 
seen already in Fig. @}, indicating that the attenuation of pho- 
tons is less than it should be. A3, by contrast, over-attenuates 
photons for low time-accuracy because it uses instantaneous col- 
umn densities. The results for A2 are similar to A3, but the er- 
rors are much larger, and the solution converges more slowly 
because it is a first-order method. For Al the dtOO and dtOl re- 
sults are almost identical, and dt02 also for the Aro - 1 simu- 
lation, because the Courant timestep condition is more restric- 
tive than the recombination time (the CFL number was set to 
1.0 for these tests). The accuracy is therefore somewhat better 



than would be obtained without the Courant condition. When 
the cell optical depth is low it is clear that Al is the most accu- 
rate algorithm, but the accuracy decreases severely for very opti- 
cally thick cells using only the recombination times as a limiter 
(dt00-dt04). Models dt06-dtl2 all produce results similar to A3 
with dt06-dtl2. A3 actually gets more accurate for higher opti- 
cal depths using dt00-dt04, and it can be seen that dt02 provides 
a good solution in all cases. Errors in gas temperature are in all 
cases identical to errors in ion fraction, since both quantities are 
integrated to the same relative error tolerance. 

The reason for the errors in Al can be traced back to its ori- 
gin as an algorithm for monochromatic light. In that case the 
spectrum cannot change with optical depth and there is a one-to- 
one correspondence between column density and the fractional 
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attenuation of ionising photons. Once a multi-frequency source 
is used, however, the fractional attenuation is a function of the 
incident spectrum and of the column density, both of which can 
change with time, meaning that the one-to-one correspondence 
is no longer so clear. When the cell optical depth changes sig- 
nificantly during a timestep, the radiation spectrum must also 
change significantly, so a time-averaged column density at a spe- 
cific frequency will no longer give an accurate value for the time- 
averaged photon attenuation fraction. It is likely that a more ac- 
curate algorithm could b e devised, and indeed it is possible that 
the C 2 -ray algorithm of Mel lema et al.l (2006b) is already more 
accurate for this problem than Al. 

4.2. Algorithm efficiency 

All algorithms provide a numerically converged solution us- 
ing the timestep criterion dtl2, but this is unfeasibly restric- 
tive for most probl ems. The cases dtll an d dtl2 bracket the 
cri terion used by IWhalen & Normanl d2006|) and more r ecentl y 
by IWise & Abell (1201 lb but, as noted bv lwise & Abell d20lTI) . 
a more efficient algorithm is desirable. The data obtained here 
allow comparison of the numerical algorithms in terms of both 
speed and accuracy, which is shown in Fig. [7] by plotting the 
LI error of the solution as a function of runtime (as a proxy for 
computational expense) at t — f rec . The runtime of the simula- 
tions was calculated using a microsecond timer that starts once 
the code enters the main timestepping loop and stops when the 
code exits this loop. The only data output during this time is 
for log-files (which is buffered) and, in addition, the calculations 
were run on a multi-core computer ensuring that at least one core 
was always idle. The runtimes of some simulations are so short, 
however, that their accuracy needed verification. To this end the 
code was instrumented with the Callerind tool of the Valgrind 
profiling and debugging software suitqj This tool counts the in- 
structions passed to the CPU and the results obtained were in- 
distinguishable from Fig.|7]but with a rescaled x-axis, demon- 
strating that the runtimes in the figure are reliable. The LI error 
is here defined as the mean error per grid cell: 



Error = — ^ \y t - y iM \ , (6) 



where y, re f is the reference solution at cell i of Ni cells, taken 
as the A3-dtl2 solution. Note that this is not a relative error, so 
most of the contribution is from cells near the ionisation front 
where the differences in y can be of the order of unity. 

In all cases the A3 results lie below the A2 results, and the 
higher rate of convergence is also apparent, especially for cal- 
culations with high optical depth cells. For calculations with 
Ato — 1 Al is more efficient than A3 using dt00-dt05, but these 
timestep criteria are not sufficiently accurate for Ato > 10, and 
the timestep criteria dt06-dtl2 are always more expensive with 
Al than A3. The convergence between the solutions for Al and 
A3 levels off for the Ato < 3 simulations (first four panels) at a 
relative difference of about 10~ 3 . The relative error tolerance in 
the microphysics integrator is set to 10~ 4 so it is not surprising 
that mean differences of < 10~ 3 are found. For the same rea- 
son the fact that the solutions agree more closely (to 10~ 4 ) for 
Atq — [10, 30] is probably not significant. 



1 http://valgrind.org 



4.3. Optimal timestep criteria 

The best timestep criterion for each algorithm is somewhat sub- 
jective, depending on what one considers to be an acceptable 
level of error compared to a fully converged solution. Errors 
from discretisation in multi-dimensional simulations are gener- 
ally ~ 1 % so it seems reasonable to set the accuracy requirement 
at around this level. In this case dt05 is sufficient for Al in al- 
most all situations, limiting the timestep by the relative change 
in internal energy and absolute change in y. It also avoids the 
limitations of dt00-dt04, which limit At by the recombination 
time even for an equilibrium situation where neither y nor E mt is 
changing. On the other hand, with dt05 an ionisation front takes 
a number of timesteps to cross a single cell, taking away the pri- 
mary advantage Al has over A3. 

For A2, dt08, dtl 1 or dtl2 give an acceptable level of accu- 
racy in all situations, in agreement wi th the assessment of pre - 
vious authors using similar criteria dWhalen & Norm an 2006). 
Fig.|7]shows, however, that it is almost always the least efficient 
algorithm, and that A3 is a much better explicit integrator. With 
A3, dt02 is already a good enough solution in all cases. It is gen- 
erally more accurate than dt05, although dt05 is superior for very 
optically thick grid cells. Any of the criteria from dt02-dtl2 are 
acceptable for A3, and if efficiency was not an issue dt03 or dt06 
would be preferable. 

5. Parallel scaling 

If we use the timestep criteria suggested in the previous section, 
then A3-dt02 is already the most efficient algorithm when run on 
a single core. A3 should also scale efficiently to a larger num- 
ber of cores because there are fewer calculations in the poorly 
scaling raytracing step. In this case the motivation for compar- 
ing the scaling of the algorithms is not so much to demonstrate 
the advantage of one algorithm over the other, but rather to test 
the scaling of each algorithm individually. It should be borne in 
mind that the overall algorithm efficiency is also strongly depen- 
dent on the microphysics integration algorithm, and a specially 
tailored scheme for Al and A3 could probably be made more 
efficient than the generic backward-differencing integrator used 
here. As an example, it is possible that the C 2 -ray method of 
Mell ema et al.l (2006b) is more efficient than the (similar) im- 
plementation used here as Al and, if so, would have a lower 
normalisation on the following plots. Raytracing here uses the 
short characteristics tracer that scales reasonably well on paral- 
lel architectures; the scaling may be different for other raytrac- 
ing algorithms such as long characteristics which concentrate 
many rays near the source. The following tests were performed 
on the JUROPA computer at the Jiilich Supercomputing Centre 
in Germany. 

The parallelisation strategy of the code is quite simple: the 
simulation domain is recursively divided into two n times with 
the division being along the axis on which subdomains have the 
largest number of cells, resulting in N = 2" subdomains of equal 
size that are as close to cubic as possible. The radiation source 
is moved to the nearest cell vertex for raytracing purposes; its 
position can be chosen so that this vertex also lies on a sub- 
domain vertex in which case quadrants/octants of the domain 
can be traced independently. Each subdomain makes (for each 
source) a list of domains closer to the source from which it needs 
to receive boundary data column densities, and another list of the 
domains to which it has to send boundary data. The boundary 
cells containing data to be sent and received are put into linked 
lists of pointers to cells for each boundary to be sent. In the cur- 
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Fig. 7. Accuracy as measured by the LI error as a function of simulation runtime in seconds for the different algorithms and 
timestep criteria, for simulation outputs at t — f rec . The first three panels show results for the simulations with cell A to - 1 
and ambient densities «h = lOcnT 3 (top left), lOOcirT 3 (top right), and 1000 cm -3 (centre left). The next three show results for 
simulations with «h = lOcnT 3 and cell Aro - 3 (centre right), Ato - 10 (bottom left), and Aro - 30 (bottom right). Each point 
represents a timestep criterion with Al in red, A2 in green, and A3 in blue. The points are numbered according to the timestep 
criteria (Al in red, A2 in black, A3 in blue), although not all numbers are readable due to overcrowding. 



rent implementation the subdomain face, edge, and corner data 
are sent separately, but they could (more simply) be sent together 
and this will be upgraded in the near future. This upgrade will 
not change the scaling drastically, but should make the code a 
little more efficient. 

The raytracing in parallel then consists of three functions. 
The first function looks for boundary data to receive until it has 
received data for each boundary in the list. Whenever data are 
received they are unpacked and copied into the relevant local 
boundary cells. The second function calls the serial short char- 
acteristics raytracing routine on the local domain, and the third 
packages the outbound column density data and executes a non- 
blocking send for each subdomain it needs to send data to. 



5. 1 . Static 2D and 3D models 

Two simulations were performed of the expansion of an H n re- 
gion into a uniform medium on a 2D axisymmetric grid and a 
3D Cartesian grid. The parameters are identical to that of the 
H n region in the previous section, using a multi-frequency ion- 
ising (and FUV heating) source; the 2D model has 5 12 2 cells and 
the 3D model 160 3 , and only the positive quadrant/octant of the 
domain is simulated. The model was run for 10 recombination 
times (3.861 x 10 11 s), again with no dynamics. 

The calculations were run first using one MPI process per 
physical core (denoted 'STD'), and then repeated with two MPI 
processes per core using the simultaneous multi-threading fea- 
ture of JUROPA (denoted 'SMT'). Both calculations were run 
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Fig. 8. Scaling comparison between algorithms 1 and 3 on the 
distributed memory cluster JUROPA. Above: Scaling of a 2D 
static photoionisation problem with 512 2 grid cells as discussed 
in the text. Below: Scaling of the same static problem, but now 
in 3D with 160 3 grid cells. 



using 8-512 MPI processes in STD mode, and 16-512 processes 
(on 8-256 cores) in SMT mode. For calculations using 512 pro- 
cesses the subdomain for each process is 16 x 32 cells in 2D 
and 20 3 cells in 3D. These very small subdomains are unlikely 
to be used in a production calculation because the number of 
ghost/boundary cells is comparable to the number of real cells. 

The strong scaling results are shown in Fig. [8] in terms of 
the total wall-clock time to solution, where ideal scaling would 
have a slope of -1. Raytracing using short characteristics must 
scale as N~^ 2 in 2D and iV~ 2 / 3 in 3D (where N is the number of 
cores) in the limit of small subdomains and zero communication 
time, and this is clearly shown in the scaling of Al. The reason 
for this scaling is that the rays must be traced causally outwards 
from the source, so parallelisation is always restricted in one of 
the spatial dimensions. This means that for 2D calculations only 
a ID curve of subdomains can be active at any time, and in 3D 
this is a 2D spherical shell of subdomains. The scaling of the 
raytracing algorithm follows simply from this. 

Al scales as well as it can be expected to, but there seems to 
be little advantage in running 2D calculations on large numbers 
of cores using this algorithm. The scaling stays roughly constant 
for Al out to the maximum number of cores used, although there 
is an indication that the curve is flattening further at 256 and 512 
cores. 

The scaling of A3 should be better than that of Al because 
the microphysics integrations are separate from the poorly scal- 
ing raytracing step, and so a larger percentage of the total com- 
putation can be performed fully in parallel. For the 2D problem 
the scaling of A3 is indeed far superior to Al, and it is faster 




Number of cores (N) 



Fig. 9. Scaling comparison between algorithms 1 and 3 on 
JUROPA for a 2D calculation with photoionisation and a 
1000 km s~' stellar wind, as discussed in the text. 



for all runs despite taking more timesteps (because of the dif- 
ferent timestep criterion). For the 2D problem with A3 there is 
no speedup in the raytracing when going from 64 to 128 cores 
with SMT and the overall calculation actually slows down for 
256 cores (and for 512 cores in STD mode); this may be due 
to network latency becoming a limiting factor. The total work 
is also being increased because the number of boundary cells is 
increasing to a large percentage of the total number of real cells, 
and it is more likely that this is the reason for the slowdown. All 
processes write a small log-file, so another possibility is that the 
very short jobs with hundreds of cores are affected by disk I/O 
congestion. The Al calculations are running about lOx slower, 
which means that issues such as network latency and disk I/O 
will not affect them to the same extent. In 3D the difference in 
the scaling is much less significant, and if Al could be made 
more efficient, it would be competitive with A3 even out to 512 
cores. Of course the prime motivation behind Al was originally 
to enable R-type ionisation fronts to cross many cells accurately 
in a single timestep, and with dt05 this no longer happens, so it 
is unclear why one would continue using Al unless it could be 
made sufficiently accurate with a less restrictive timestep. 

Neither algorithm scales ideally (i.e. linear speedup with in- 
creasing core count); the reason for this is clear with Al, but for 
A3 it is also true when the raytracing is taking a negligible frac- 
tion of the runtime. The reason for the less-than-ideal scaling in 
this case seems to be that the microphysics integrator does not 
have constant work per grid point, but instead the computation is 
concentrated near the ionisation front. In ionised gas very little 
is changing, and in neutral gas this is also the case, so the micro- 
physics integration is almost trivial. Within the ionisation front, 
however, the equations are stiff and an accurate solution requires 
much more computation. There is a boundary data exchange af- 
ter the microphysics update, so effectively this step is limited by 
the slowest subdomain. In principle this could be solved by ac- 
tive load-balancing, where the size of each subdomain is varied 
according to the computation time required. 

5.2. Models with a fast stellar wind 

To test the scaling in the opposite extreme where the dynam- 
ics strongly limits the timestep, a simulation was run includ- 
ing photoionisation and a stellar wind from a hot massive star. 
The scaling results of this test are representative of a produc- 
tion calculation. The star is moving with 4kms _1 through a 
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constant density neutral medium with «h = 3000 cirT 3 , and is 
emits 3 x 10 48 ionising photons per second with a blackbody 
spectrum of T — 37 500K. The stellar wind parameters are 
M = 2 x 10~ 7 Moyr _1 and v w = lOOOkms -1 . An axisymmetric 
model was run with 512 x 256 grid cells and physical domain 
z e [-1.28, 1.28] x 10 18 cm and R e [0,1.281 x 10 18 cm. The 
wind was injected following van Mari e et al.l d2006l) by impos- 
ing a freely expanding, adiabatically cooling, fully ionised wind 
within a radius of 15 cells of the origin (out to 0.75 x 10 17 cm). 
The Stromgren radius was R$ = 6.75 x 10 17 cm, and in the initial 
conditions the ambient medium was ionised out to 1.92 x 10 17 
cm, more than twice the radius of the wind boundary condition. 
This substantially reduces the time during which the ionisation 
front is R-type, and consequently for most of the calculation the 
overall timestep was set according to the Courant condition on 
the 1000 km s~' wind region and not according to the micro- 
physics timestep restrictions. The simulation was run for 10 kyr, 
or =i 245f rec . The scaling results in Fig.|9](this time only for STD 
mode with one MPI process per physical core) show similar re- 
sults to the static 2D case for Al. The scaling of A3 is somewhat 
better than for the static case, probably because each timestep 
now requires more computation and hence fewer timesteps are 
completed per second for a given number of cores. Indeed, the 
speedup is basically linear with A3 from 32 to 128 cores and, in 
terms of total core-hours required, the 256 core run still has an 
efficiency of 56% compared to the 16 core run. 

6. Discussion and conclusions 

6. 1 . Explicit algorithms 

Variants of three commonly used photoionisation tracking al- 
gorithms have been implemented and tested for their accuracy, 
efficiency and scaling properties. The first-order accurate ex- 
plicit algorithm, A2, was shown to be the least accurate and effi- 
cient when R-type ionisation fronts are present, with one of A2- 
dt08, dtll, or dtl2 required for reasonably converged solution. 
Algorithm 3 (A3) is an extension of A2 to second-order time ac- 
curacy; its advantages over A2 in terms of photon conservation 
and efficiency for different timestepping criteria are presented 
here for the first time. The results of this comparison demon- 
strate that A2 should always be rejected in favour of A3 (or a 
similar higher order integration) when an explicit algorithm is to 
be used. A sufficient timestep criterion for photon conservation 
and ionisation front tracking is A3-dt02, where the microphysics 
timestep limit is Af = 0.25 /y. This could possibly be relaxed 
to Af = 0.5/y if errors of ~ 5 per cent are considered accept- 
able, but the errors are > 10 per cent (and fairly unpredictable) 
if Af = 1.0/y is used. The performance of this timestep criterion 
in dynamical multi-dimensional simulations will be tested in fu- 
ture work. Both A2 and A3 improve in accuracy with higher cell 
optical depths, but the gain is more noticeable with A3. These 
conclusions comparing A2 to A3 hold for both monochromatic 
and multi-frequency ionising radiation. 

In comparison with this work, the algorithm of 
IWhalen & Normanl d2006h is closest to A2 in that it uses 
instantaneous column densities and is first-order accurate 
in time (for photon conservation). These authors split the 
time-integration into two steps: a full timestep where the 
dynamics and microphysics are updated, limited by the Courant 
condition and by the condition that the internal energy in 
any cell can change by at most 10%; and also a substep in 
which the raytracing is performed and the chemical network 
updated, with the sub-timestep set by the requirement that the 



electron density change by at most 10%. Each full timestep 
therefore consists of one or more substeps . Si milar timestep 
limits are also used by Krum holz et al.l d2007l) and I Wise & Abell 
(2011), with both algorithms closely related to A2 based on 
the published description. This timestep criterion is quite close 
to dtll, where the internal energy is allowed to change by at 
most 12.5% and the neutral fraction by at most 12.5% (which 
is approximately the inverse of the electron fraction). Using 
A2, it is shown here that dtll and dtl2 (as well as dt08) give 
basically converged resu lts in all situations, in agreement with 
IWhalen & Normanl (2006). In addition, it has been shown here 
that these criteria are sufficient for multi-frequency radiation 
as well as the monochromatic radiation used in these previous 
works. 

The clear advantages of A3 over A2 in terms of accuracy and 
efficiency suggest, however, that other codes could make large 
efficiency gains by switching to an algorithm similar to A3-dt02 
for the raytracing/chemistry step (or substep). Although this in- 
volves two raytracings and chemistry integrations per (sub)step 
instead of one, the much longer timestep allowed means that less 
total computation is required, and furthermore that fewer ray- 
tracings are required for a given computation. This is important 
because raytracing is th e major bottleneck for the parallelised 
AMR implementation of Wise & Abel (201 1). The improvement 
in A3 compared to A2 is independent of spatial resolution, re- 
quiring only a higher order, finite volume formulation, so it is ex- 
pected that the same gains in efficiency and accuracy presented 
here could just as easily be gained by AMR codes, although the 
parallel scaling is admittedly much more complicated. A3 can 
also be applied to photoionisation substeps by simply omitting 
the dynamics updates from the sequence of steps shown in Fig.Q] 
While the mass density field is not time-centred in a substep, the 
ion fractions and hence the column densities are, and if substep- 
ping is employed it means the density field is evolving on a much 
longer timescale anyway. 

Even with the second-order accurate A3, errors greater than 
10 per cent are obtained using dtOO w ith K\ — 1; this likely 
explains why the Riikhorst et al. (2006) flash-hc algorithm had 
difficulty tracking R-typ e ionisation fron ts accurately in the code 
comparison project of llliev et al.l d2006ah . As an explicit algo- 
rithm, it is not possible to accurately track R-type ionisation 
fronts crossing more than one optically thick cell per raytracing. 
On the other hand, D-type ionisation fronts are subsonic by def- 
inition (with respect to at least one of the gas components), and 
so the Courant condition automatically imposes K4 < 1, leading 
to accurate ionisation front propagation. 

6.2. Implicit algorithm 

Implicit methods have, in principle, a clear efficiency advantage 
over explicit methods for R-type ionisation fronts, at least for 
monochromatic radiation, although the timestep does need to 
be restricted to a fraction of f rec . For Al, which is similar bu t 
not identical to the C 2 -ray method of lMellemaet all (|2006b), 
ionisation fronts are tracked with negligible error for tests with 
monochromatic radiation and no recombinations, shown in sec- 
tion [3] When recombinations are switched on the accuracy is 
somewhat lower, but the expansion of an H 11 region is tracked 
by a single timestep to f = 0.1f rec with < 1.5% error (with cri- 
terion dt02). For this reason Al-dt02 is a vastly more efficient 
algorithm than A2 or A3 for tracking R-type ionisation fronts 
with monochromatic radiation. 

The situation is not so simple with multi-frequency radiation. 
For a given emitted radiation spectrum, the transmission of pho- 
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tons through a cell depends not only on the optical depth of the 
cell, but also on the optical depth from the source to the cell (be- 
cause this changes the incident radiation spectrum). In addition, 
when the optical depth within a cell changes by a value » 1 
over a timestep, the transmitted spectrum also changes signifi- 
cantly, and it is no longer clear that a time-averaged optical depth 
(or equivalently attenuation fraction at a specific frequency) will 
give good photon conservation. This is borne out in the results 
from section |4] where Al performs well when cell optical depths 
are Atq < 3, but the accuracy decreases as Aro increases for 
dt00-dt04. To achieve good accuracy for all densities and values 
of Ato one of dt05-dtl2 must be used, removing entirely the ef- 
ficiency advantage of Al over A3. Here it is suggested that dt05 
represents the best balance of accuracy and efficiency, although 
it remains significantly less efficient than A3-dt02. 

No attempt has been made to modify Al to produce better re- 
sults with multi-frequency radiation. It is possible that choosing 
more carefully the frequency at which the time-averaged atten- 
uation fraction is calculated would give a more accurate result, 
and it is also possible that modifying the time-averaging strategy 
would improve photon conservation. It is also possible that the 
C 2 -ray method (which does have a different time-averaging strat- 
egy) is already more accurate than Al with multi-frequency ra- 
diation. Testing these hypotheses is, however, beyond the scope 
of this work. 



6.3. Parallel scaling 

In the limit of many processors with small simulation domains 
the time for the raytracing step scales with N~ 1 ^ 2 in 2D calcu- 
lations and N~ 2 ^ in 3D (where N is the number of cores) when 
using the method of short characteristics to trace rays from point 
sources in Cartesian geometry. This is because the rays must be 
traced outwards from the source in sequence, so only a 2D sur- 
face (or ID curve) of subdomains can be active at any time in a 
3D (or 2D) raytracing. The scaling of Al closely follows these 
power laws, both for static and dynamical simulations. A3 scales 
significantly better than Al in 2D calculations and somewhat 
better in 3D; this is a consequence of having a smaller fraction 
of the total computation in the raytracing step. The scaling of A3 
is less than ideal even when raytracing is not a limiting factor, 
and this is likely due to imbalances in the work required for the 
microphysics integration (where most computation is required 
near the ionisation front). Despite this, the efficiency of A3 in 
2D and 3D remains above 50% on up to > 100 cores for the 
test calculations run here (up to 256 cores for the test including 
dynamics). 

The scaling of A2 should be the same as A3, but it is less 
efficient and less accurate, so it was not tested here. The result 
that A3 is here always more efficient than Al for all N is cer- 
tainly implementation-dependent, and if a version of Al could 
be devised that allows (for example) timestep criterion dt02 to 
be used, then Al would suddenly become much more efficient 
by virtue of requiring many fewer timesteps than A3. For gen- 
eral problems, however, A3 has a higher order of accuracy and 
better scalability than Al and so may be more suitable for par- 
allel simulations where photoionisation has a strong effect on 
gas dynamics. In addition, the simplest possible non-equilibrium 
chemistry model has been used here; the scaling advantage of A3 
should be more important when more computation is required in 
the chemistry/thermal physics source term integration by e.g. the 
inclusion of He and H2. 
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